Preface

Open Rstudio to do the practicals. Note that tasks with * are optional.

R packages

In this practical, a number of R packages are used. The packages used (with versions that were used to generate the solutions) are:

  • survival (version: 3.3.1)

R version 4.2.1 (2022-06-23 ucrt)

Datasets

For this practical, we will use the heart and retinopathy data sets from the survival package. More details about the data sets can be found in:

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html

Combinations of Functions and Loops

Common R Objects

Task 1

Examine the following code in R.

h <- function(k){
       if (k <= 20){
        3 * k
      } else {
      2 * k
    }
}

Investigate what this code takes as an input (e.g. scalar, vector, matrix, array, data.frame, list) and what the output will be in every case.

Solution 1

h <- function(k){
       if (k <= 20){
        3 * k
      } else {
      2 * k
    }
}

# scalar:
scalar_test <- 10

# vector:
vec_test <- c(1:10)

# matrix:
mat_test <- matrix(data = 1:6, nrow = 3, ncol = 3)
## Warning in matrix(data = 1:6, nrow = 3, ncol = 3): data length differs from size of matrix: [6 != 3
## x 3]
#array:
array_test <- array(data = 1:5, dim = c(2, 2, 2))

# data frame:
df_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))

# list:
list_test <- list(l1 = 1:30, l2 = c("m", "f"), l3 = c(TRUE, TRUE, FALSE, FALSE, FALSE))

# Test the function (after you remove the #):
# h(scalar_test)
# h(vec_test)
# h(mat_test)
# h(array_test)
# h(df_test)
# h(list_test)

Task 2

Examine the following code in R.

h <- function(k){
  for(i in 1:length(k)){
    k[i]<- if (k[i] <= 20){
      3 * k[i]
    } else {
      2 * k[i] 
    }
  }
  k
}

Investigate what this code takes as an input (e.g. scalar, vector, matrix, array, data.frame, list) and what the output will be in every case.

Solution 2

h <- function(k){
  for(i in 1:length(k)){
    k[i]<- if (k[i] <= 20){
      3 * k[i]
    } else {
      2 * k[i] 
    }
  }
  k
}

# scalar:
scalar_test <- 10

# vector:
vec_test <- c(1:10)

# matrix:
mat_test <- matrix(data = 1:6, nrow = 3, ncol = 3)
## Warning in matrix(data = 1:6, nrow = 3, ncol = 3): data length differs from size of matrix: [6 != 3
## x 3]
#array:
array_test <- array(data = 1:5, dim = c(2, 2, 2))

# data frame:
df_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))

# list:
list_test <- list(l1 = 1:30, l2 = c("m", "f"), l3 = c(TRUE, TRUE, FALSE, FALSE, FALSE))

# h(scalar_test)
# h(vec_test)
# h(mat_test)
# h(array_test)
# h(df_test)
# h(list_test)

Task 3

Define a function (with the name mysummary) that takes one parameter (let’s call it x) and investigates if it is a data frame, a factor, or a numeric variable and returns the message “This is a data.frame” if x is a data frame, “This is a factor” is x is a factor and “This is numeric” if x is numeric.

Use an if statement. When you want to display results from the function you can use the print(…) function.

Solution 3

mysummary <- function(x){
  if(is.data.frame(x)){
    print('This is a data.frame')
  }else if(is.factor(x)){
    print('This is a factor')
  }else if(is.numeric(x)){
    print('This is numeric')
  }
}
df_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
vec1_test <- as.factor(c("y", "n", "n", "n"))
vec2_test <- 1:10
mysummary(df_test)
## [1] "This is a data.frame"
mysummary(vec1_test)
## [1] "This is a factor"
mysummary(vec2_test)
## [1] "This is numeric"

Data Exploration

Task 1

Extend the previous function mysummary to take again one parameter (let’s call it x) but to return also some descriptive statistics. We therefore modify the function in such a way that a frequency table is given for factors and the mean and standard deviation are given for numeric data. For a data frame, return only an extra message that indicates that this is not implemented yet.

Use the functions table(…), mean(…) and sd(…).

Solution 1

mysummary <- function(x){
  if(is.data.frame(x)){
    print('This is a data.frame')
    print('Not yet implemented')
  }else if(is.factor(x)){
    print('This is a factor')
    print(table(x))
  }else if(is.numeric(x)){
    print('This is numeric')
    print('mean')
    print(mean(x))
    print('sd')
    print(sd(x))
  }
}
df_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
vec1_test <- as.factor(c("y", "n", "n", "n"))
vec2_test <- 1:10
mysummary(df_test)
## [1] "This is a data.frame"
## [1] "Not yet implemented"
mysummary(vec1_test)
## [1] "This is a factor"
## x
## n y 
## 3 1
mysummary(vec2_test)
## [1] "This is numeric"
## [1] "mean"
## [1] 5.5
## [1] "sd"
## [1] 3.02765

Task 2*

The previous function mysummary still does not do anything useful when we apply it on the whole data.frame. Extend the function to take again one parameter (let’s call it x) but to return also some descriptive statistics for the data frame.

This can be solved in the following way: whenever the function is called using a data.frame as argument, we use a for loop to loop over its columns. Within this loop we let the function call itself but now using the column as argument.

Solution 2*

mysummary <- function(x){
  if(is.data.frame(x)){
    print('This is a data.frame')
    for(i in 1:ncol(x)){
      print(names(x)[i])
      mysummary(x[,i])
    }     
  }else if(is.factor(x)){
    print('This is a factor')
    print(table(x))
  }else if(is.numeric(x)){
    print('This is numeric')
    print('mean')
    print(mean(x))
    print('sd')
    print(sd(x))
  }
}
df1_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
df2_test <- data.frame(v1 = 1:10, v2 = sample(c("no", "yes"), 10, replace = TRUE))
mysummary(df1_test)
## [1] "This is a data.frame"
## [1] "v1"
## [1] "This is numeric"
## [1] "mean"
## [1] 5.5
## [1] "sd"
## [1] 3.02765
## [1] "v2"
## [1] "This is numeric"
## [1] "mean"
## [1] 0.2
## [1] "sd"
## [1] 0.421637
mysummary(df2_test)
## [1] "This is a data.frame"
## [1] "v1"
## [1] "This is numeric"
## [1] "mean"
## [1] 5.5
## [1] "sd"
## [1] 3.02765
## [1] "v2"

Data Manipulation

Task

Create a function with the name std_num that takes as input a data frame (let’s call it DF), standardizes all the numerical variables and returns the new data frame. Apply this function to the heart data set.

Use an if statement and a for loop. Note that you will have to transform some variables to factors first.

Solution

heart$event <- as.factor(heart$event)
heart$surgery <- as.factor(heart$surgery)
heart$transplant <- as.factor(heart$transplant)
heart$id <- as.factor(heart$id)

std_num <- function(DF){
     for (j in 1:dim(DF)[2]){
            if (is.numeric(DF[,j])){
             DF[,j] <- (DF[,j] - mean(DF[,j]))/sd(DF[,j])
            }
     }
      DF
}
std_num(heart)

Data Visualization

Task 1

Create a for loop that goes through the columns of the retinopathy data set and plots each column. If the column is a numerical variable then use a density plot, if the column is a categorical variable then use a barchart. Create a plot with multiple panels.

Use an if statement. Use the function par(mfrow = …) to create multiple plots. Note that you will have to transform some variables to factors first.

Solution 1

retinopathy$trt <- as.factor(retinopathy$trt)
retinopathy$status <- as.factor(retinopathy$status)

par(mfrow = c(3, 3))
for (j in 1:dim(retinopathy)[2]){
  if (is.numeric(retinopathy[,j])){
    plot(density(retinopathy[,j]), col = rgb(0,0,1,0.5))
    polygon(density(retinopathy[,j]), col = rgb(0,0,1,0.5), border = "blue")
  } else if (is.factor(retinopathy[,j])){
    plot(retinopathy[,j])
  }
}

Task 2

Create a function that applies the previous code in any data set. The name of the function will be plot_summary and it will take as input a data frame (let’s call it dt) and the dimension of the panel that includes the plots. In particular, dim_row represents the rows and dim_col the columns. The function will return a plot for each column. If the column is a numerical variable then use a density plot, if the column is categorical variable then use a barchart. Apply this function to the retinopathy data set.

Use an if statement and a for loop. Use the function par(mfrow = …) to create multiple plots. Note that you will have to transform some variables to factors first.

Solution 2

retinopathy$trt <- as.factor(retinopathy$trt)
retinopathy$status <- as.factor(retinopathy$status)

plot_summary <- function(dt, dim_row, dim_col){
  par(mfrow = c(dim_row, dim_col))
  for (j in 1:dim(dt)[2]){
    if (is.numeric(dt[,j])){
      plot(density(dt[,j]), col = rgb(0,0,1,0.5))
      polygon(density(dt[,j]), col = rgb(0,0,1,0.5), border = "blue")
    } else if (is.factor(dt[,j])){
      plot(dt[,j])
    }
  }
}
plot_summary(retinopathy, 3, 3)

Task 3*

Extend the previous function plot_summary to include also plot titles indicating the variable. Apply this function to the retinopathy data set.

Use the paste0(…) function.

Solution 3*

retinopathy$trt <- as.factor(retinopathy$trt)
retinopathy$status <- as.factor(retinopathy$status)

plot_summary <- function(dt, dim_row, dim_col){
  par(mfrow = c(dim_row, dim_col))
  for (j in 1:dim(dt)[2]){
    if (is.numeric(dt[,j])){
      plot(density(dt[,j]), col = rgb(0,0,1,0.5), main = paste0(colnames(dt)[j]))
      polygon(density(dt[,j]), col = rgb(0,0,1,0.5), border = "blue")
    } else if (is.factor(dt[,j])){
      plot(dt[,j], main = paste0(colnames(dt)[j]))
    }
  }
}
plot_summary(retinopathy, 3, 3)

Subsetting

Task 1

For each data set (heart and retinopathy), select only the observations that report an event.

Note that the variable name of the events is diffent for the two data sets.

Solution 1

heart[heart$event == 1, ]
retinopathy[retinopathy$status == 1, ]

Task 2

Create a function with the name dt_events that takes as input a list with data sets (let’s call it dt_list) and a vector indicating the name of the event variable for each data set (let’s call it event_name). The function will return a list consisting of each data set but including only the observations/rows that had an event. Create a list that consists of the data sets heart and retinopathy and apply the function.

Use a for loop. Note that the variable name of the events is different for the two data sets.

Solution 2

dt_events <- function(dt_lists, event_name){
  new_dt <- list()
  for (k in 1:length(dt_lists)) {
      dt <- dt_lists[[k]]
      new_dt[[k]] <- dt[dt[[event_name[k]]] == 1, ]
  }
  print(new_dt)
}

dt_events(dt_lists = list(heart, retinopathy), event_name = c("event", "status"))
## [[1]]
##     start stop event          age        year surgery transplant  id
## 1     0.0   50     1 -17.15537303  0.12320329       0          0   1
## 2     0.0    6     1   3.83572895  0.25462012       0          0   2
## 4     1.0   16     1   6.29705681  0.26557153       0          1   3
## 6    36.0   39     1  -7.73716632  0.49007529       0          1   4
## 7     0.0   18     1 -27.21423682  0.60780287       0          0   5
## 8     0.0    3     1   6.59548255  0.70088980       0          0   6
## 10   51.0  675     1   2.86926762  0.78028747       0          1   7
## 11    0.0   40     1  -2.65023956  0.83504449       0          0   8
## 12    0.0   85     1  -0.83778234  0.85694730       0          0   9
## 14   12.0   58     1  -5.49760438  0.86242300       0          1  10
## 16   26.0  153     1  -0.01916496  0.87337440       0          1  11
## 17    0.0    8     1   5.19370294  0.96372348       0          0  12
## 19   17.0   81     1   6.57357974  0.96919918       0          1  13
## 21   37.0 1387     1   6.01232033  0.97193703       0          1  14
## 22    0.0    1     1   5.81519507  0.99110198       1          0  15
## 24   28.0  308     1   1.44832307  1.07049966       0          1  16
## 25    0.0   36     1 -27.66872005  1.07597536       0          0  17
## 27   20.0   43     1   8.84873374  1.08692676       0          1  18
## 28    0.0   37     1  11.12388775  1.13347023       0          0  19
## 30   18.0   28     1   7.27994524  1.33059548       0          1  20
## 32    8.0 1032     1  -4.65708419  1.33880903       0          1  21
## 34   12.0   51     1  -5.21560575  1.46201232       0          1  22
## 36    3.0  733     1  10.35728953  1.52772074       0          1  23
## 38   83.0  219     1   3.80013689  1.56605065       0          1  24
## 42    0.0  263     1 -39.21423682  1.59069131       0          0  27
## 44   71.0   72     1   6.02327173  1.68377823       0          1  28
## 45    0.0   35     1   2.43394935  1.78507871       0          0  29
## 47   16.0  852     1  -3.08829569  1.88364134       0          1  30
## 48    0.0   16     1   6.88569473  1.89459274       0          0  31
## 50   17.0   77     1  16.40793977  1.91101985       0          1  32
## 55    0.0   12     1  -4.53388090  2.30800821       0          0  35
## 57   46.0  100     1   0.92539357  2.50787132       0          1  36
## 59   19.0   66     1  13.50034223  2.56536619       0          1  37
## 61    4.5    5     1  -6.52977413  2.59274470       0          1  38
## 63    2.0   53     1   2.51882272  2.63381246       0          1  39
## 68    0.0    3     1 -11.55920602  2.88843258       0          0  42
## 69    0.0    2     1  -4.60780287  3.05817933       1          0  43
## 70    0.0   40     1  -5.42094456  3.16495551       1          0  44
## 72    1.0   45     1 -11.81656400  3.26351814       0          1  45
## 74    2.0  996     1   0.61054073  3.27720739       1          1  46
## 76   21.0   72     1  -0.90075291  3.34017796       0          1  47
## 77    0.0    9     1   8.03559206  3.34839151       0          0  48
## 81   83.0  980     1  -2.11362081  3.37577002       1          1  50
## 83   32.0  285     1   0.73374401  3.47707050       0          1  51
## 84    0.0  102     1  -6.75154004  3.56468172       0          0  52
## 86   41.0  188     1  -0.65708419  3.75085558       0          1  53
## 87    0.0    3     1  -0.20807666  3.75085558       0          0  54
## 89   10.0   61     1   4.45448323  3.85489391       0          1  55
## 92    0.0  149     1  -6.73511294  3.95071869       0          0  57
## 94   21.0  343     1   0.01642710  3.97809719       1          1  58
## 98    3.0   68     1   1.05407255  4.13141684       0          1  60
## 99    0.0    2     1   4.56399726  4.17522245       0          0  61
## 100   0.0   69     1  -8.64613279  4.18891170       0          0  62
## 104  33.0  584     1   0.81587953  4.33675565       1          1  64
## 106  12.0   78     1   3.29363450  4.42984257       0          1  65
## 107   0.0   32     1   5.21286790  4.46817248       0          0  66
## 109  57.0  285     1 -28.44900753  4.47638604       0          1  67
## 111   3.0   68     1  -2.75975359  4.51745380       0          1  68
## 115   5.0   30     1   5.00205339  4.71184120       0          1  70
## 121  27.0   90     1   8.33127995  4.94729637       0          1  73
## 123   5.0   17     1 -18.83367556  4.96646133       0          1  74
## 124   0.0    2     1   4.18069815  4.99657769       0          0  75
## 127   0.0   21     1  -6.88843258  5.01574264       0          0  77
## 131  67.0   96     1   5.78234086  5.16632444       0          1  79
## 138  32.0   80     1   5.30869268  5.31690623       0          1  83
## 140  37.0  334     1  -5.28131417  5.33333333       0          1  84
## 141   0.0    5     1  -0.01916496  5.35249829       0          0  85
## 145  60.0  110     1  -1.74674880  5.47022587       0          1  87
## 149 139.0  207     1   3.04722793  5.51129363       0          1  89
## 151 160.0  186     1   4.03285421  5.51403149       1          1  90
## 152   0.0  340     1  -0.40520192  5.53319644       0          0  91
## 158   4.0  165     1  -4.15879535  5.95482546       1          1  94
## 160   2.0   16     1  -7.71800137  5.97672827       0          1  95
## 167   0.0   21     1   1.83436003  6.23408624       0          0  99
## 172   0.0    6     1  -8.68446270 -0.04928131       0          0 103
## 
## [[2]]
##       id laser   eye age     type trt futime status risk
## 4     14 argon right  12 juvenile   0  31.30      1    6
## 10    29 xenon  left  13 juvenile   0   0.30      1   10
## 12    46 xenon right  12 juvenile   0  54.27      1    9
## 14    49 argon right   8 juvenile   0  10.80      1    6
## 20    71 argon right  21    adult   0  13.83      1    9
## 21   100 argon  left  23    adult   1  46.43      1    9
## 24   112 argon right  44    adult   0   7.90      1   12
## 27   127 xenon right  48    adult   1  30.83      1    6
## 28   127 xenon right  48    adult   0  38.57      1   10
## 30   133 argon right  26    adult   0  14.10      1    9
## 31   150 argon  left  10 juvenile   1  20.17      1    9
## 32   150 argon  left  10 juvenile   0   6.90      1   10
## 34   167 argon  left  23    adult   0  41.40      1    9
## 44   214 xenon right  45    adult   0   0.60      1   12
## 45   220 argon right  11 juvenile   1  10.27      1   10
## 46   220 argon right  11 juvenile   0   1.63      1   10
## 49   255 xenon  left  10 juvenile   1   5.67      1   12
## 50   255 xenon  left  10 juvenile   0  13.83      1    9
## 52   264 argon right  12 juvenile   0  29.97      1   11
## 54   266 argon right  36    adult   0  26.37      1   11
## 55   284 argon  left  53    adult   1   5.77      1   10
## 56   284 argon  left  53    adult   0   1.33      1   12
## 57   295 xenon  left  10 juvenile   1   5.90      1   11
## 58   295 xenon  left  10 juvenile   0  35.53      1   11
## 59   300 xenon  left  25    adult   1  25.63      1   10
## 60   300 xenon  left  25    adult   0  21.90      1    9
## 61   302 argon  left  14 juvenile   1  33.90      1    9
## 62   302 argon  left  14 juvenile   0  14.80      1    9
## 63   315 argon  left  16 juvenile   1   1.73      1   10
## 64   315 argon  left  16 juvenile   0   6.20      1    8
## 66   324 xenon  left  38    adult   0  22.00      1    8
## 69   335 argon  left  10 juvenile   1  30.20      1    9
## 70   335 argon  left  10 juvenile   0  22.00      1   11
## 73   349 xenon right  44    adult   1  25.80      1   11
## 74   349 xenon right  44    adult   0  13.87      1    8
## 75   357 argon  left  21    adult   1   5.73      1    9
## 76   357 argon  left  21    adult   0  48.30      1    9
## 79   385 argon right  13 juvenile   1   1.90      1   10
## 81   396 argon  left  40    adult   1   9.90      1   10
## 82   396 argon  left  40    adult   0   9.90      1    9
## 86   409 argon  left  48    adult   0   2.67      1   12
## 88   419 xenon right  42    adult   0  13.83      1   10
## 90   429 xenon right  24    adult   0   4.27      1   10
## 92   433 argon  left  55    adult   0  13.90      1   10
## 97   468 argon right   6 juvenile   1   1.70      1   10
## 98   468 argon right   6 juvenile   0   1.70      1    6
## 99   480 xenon right  19 juvenile   1   1.77      1    9
## 100  480 xenon right  19 juvenile   0  43.03      1   12
## 105  503 xenon  left  27    adult   1   8.30      1    9
## 106  503 xenon  left  27    adult   0   8.30      1   10
## 108  515 argon right  43    adult   0  18.43      1    9
## 112  538 argon  left  45    adult   0  31.63      1    9
## 115  550 xenon  left   3 juvenile   1  18.70      1   10
## 116  550 xenon  left   3 juvenile   0   6.53      1   11
## 120  557 argon right  13 juvenile   0  22.23      1    8
## 122  561 xenon right  15 juvenile   0  14.00      1    9
## 123  568 argon  left  10 juvenile   1  42.17      1    9
## 124  568 argon  left  10 juvenile   0  42.17      1   10
## 126  572 xenon  left   6 juvenile   0   5.33      1    9
## 128  576 xenon  left  17 juvenile   0  59.80      1   10
## 130  581 argon  left  37    adult   0   5.83      1   11
## 132  606 xenon right  18 juvenile   0   2.17      1   12
## 133  610 xenon  left  13 juvenile   1  14.30      1    9
## 134  610 xenon  left  13 juvenile   0  48.43      1    8
## 141  631 argon  left  11 juvenile   1  13.33      1   12
## 142  631 argon  left  11 juvenile   0   9.60      1   10
## 147  653 xenon  left  15 juvenile   1  14.27      1    9
## 148  653 xenon  left  15 juvenile   0   7.60      1   12
## 149  662 argon  left   7 juvenile   1  34.57      1   12
## 150  662 argon  left   7 juvenile   0   1.80      1   12
## 152  664 argon  left   2 juvenile   0   4.30      1   12
## 153  683 xenon  left  22    adult   1   4.10      1    8
## 154  683 xenon  left  22    adult   0  12.20      1    6
## 160  706 xenon right  27    adult   0  12.73      1   10
## 162  717 xenon right  53    adult   0  54.10      1   11
## 164  722 xenon right  10 juvenile   0   9.40      1   12
## 165  731 xenon  left  13 juvenile   1  21.57      1   12
## 166  731 xenon  left  13 juvenile   0   9.90      1   10
## 181  778 xenon  left  25    adult   1  26.23      1    8
## 184  780 argon right  15 juvenile   0  18.03      1   11
## 189  804 xenon right  23    adult   1   7.07      1   10
## 191  810 xenon  left  13 juvenile   1  13.77      1   10
## 192  810 xenon  left  13 juvenile   0  13.77      1   12
## 194  815 xenon  left  45    adult   0   9.63      1   12
## 198  834 argon  left   8 juvenile   0   1.50      1   12
## 199  838 xenon  left  30    adult   1  33.63      1    9
## 200  838 xenon  left  30    adult   0  33.63      1    9
## 203  866 argon  left  39    adult   1  63.33      1   11
## 204  866 argon  left  39    adult   0  27.60      1   10
## 205  887 argon right  26    adult   1  38.47      1   10
## 206  887 argon right  26    adult   0   1.63      1   10
## 210  910 xenon  left  34    adult   0  25.30      1   10
## 212  920 xenon  left  10 juvenile   0  46.20      1    8
## 214  925 argon right  40    adult   0   1.70      1   10
## 221  949 xenon right  13 juvenile   1  10.33      1    9
## 222  949 xenon right  13 juvenile   0   0.83      1   10
## 223  952 argon  left   9 juvenile   1   6.13      1   12
## 226  962 argon  left   5 juvenile   0  25.93      1    9
## 230  971 argon right  23    adult   0  19.40      1   10
## 232  978 argon  left   2 juvenile   0  21.97      1   10
## 235  987 xenon  left   7 juvenile   1  26.20      1   10
## 238 1002 argon  left  13 juvenile   0  18.03      1   11
## 239 1017 xenon  left  50    adult   1  13.83      1    8
## 240 1017 xenon  left  50    adult   0   1.57      1    9
## 242 1029 xenon right  20    adult   0  13.37      1   12
## 243 1034 argon  left  15 juvenile   1  11.07      1   11
## 244 1034 argon  left  15 juvenile   0   1.97      1    9
## 246 1037 xenon right  30    adult   0  22.20      1    9
## 251 1074 xenon  left   4 juvenile   1   6.10      1   10
## 253 1098 argon  left   3 juvenile   1   2.10      1   10
## 254 1098 argon  left   3 juvenile   0  11.30      1   11
## 255 1102 argon right  10 juvenile   1  17.73      1   10
## 265 1145 argon  left  44    adult   1  12.93      1   10
## 266 1145 argon  left  44    adult   0   4.97      1    9
## 268 1148 argon  left  48    adult   0  26.47      1   12
## 271 1184 argon  left  46    adult   1  24.43      1   12
## 272 1184 argon  left  46    adult   0   9.87      1   11
## 275 1205 xenon  left  12 juvenile   1  13.97      1   11
## 276 1205 xenon  left  12 juvenile   0  30.40      1   12
## 278 1213 xenon right  12 juvenile   0  43.33      1   10
## 288 1267 xenon right  50    adult   0  38.57      1   11
## 290 1281 xenon right  44    adult   0   3.67      1   10
## 291 1287 argon  left   8 juvenile   1  48.87      1   10
## 296 1296 xenon  left  18 juvenile   0  15.83      1   10
## 298 1309 xenon right  56    adult   0   8.83      1    8
## 304 1321 xenon right   5 juvenile   0  22.13      1    9
## 305 1333 argon right   1 juvenile   1   6.30      1    9
## 308 1347 xenon right   1 juvenile   0  18.93      1   10
## 309 1361 xenon  left  14 juvenile   1  13.80      1   10
## 310 1361 xenon  left  14 juvenile   0  19.00      1    9
## 313 1373 argon  left   8 juvenile   1  13.57      1   11
## 314 1373 argon  left   8 juvenile   0   5.43      1   11
## 320 1413 argon right   3 juvenile   0   7.10      1   12
## 322 1425 argon  left  35    adult   0  26.17      1   11
## 324 1447 argon  left   8 juvenile   0  24.73      1   11
## 328 1469 xenon right  51    adult   0  10.97      1   12
## 330 1480 argon  left  42    adult   0  21.10      1    9
## 332 1487 argon right  20    adult   0  43.70      1    8
## 336 1499 xenon right  22    adult   0  14.37      1    9
## 343 1533 argon  left  41    adult   1   1.50      1    9
## 346 1537 xenon right  19 juvenile   0  38.40      1   10
## 348 1552 xenon right   4 juvenile   0   2.83      1   11
## 352 1562 xenon right  20    adult   0   6.57      1    9
## 354 1572 xenon right  24    adult   0  46.27      1    9
## 359 1596 argon right  16 juvenile   1  34.37      1    9
## 366 1619 xenon  left  20    adult   0  61.83      1   12
## 367 1627 xenon  left  10 juvenile   1   6.57      1   10
## 369 1636 argon right  16 juvenile   1  38.87      1    6
## 371 1640 xenon  left  10 juvenile   1  42.43      1   11
## 372 1640 xenon  left  10 juvenile   0  46.63      1    9
## 375 1649 argon right   1 juvenile   1   2.70      1   10
## 383 1688 xenon  left   5 juvenile   1  13.10      1   11
## 388 1717 argon  left  22    adult   0  42.33      1   11
## 390 1727 argon right  33    adult   0   2.90      1   10
## 392 1746 argon right   3 juvenile   0   1.43      1   10

The Apply family

Task 1

For each data set (heart and retinopathy), obtain the mean age per event status.

Solution 1

tapply(heart$age, heart$event, mean)
##         0         1 
## -3.421947 -1.270983
tapply(retinopathy$age, retinopathy$status, mean)
##        0        1 
## 20.35146 21.44516

Task 2

Create a function with the name summary_list that takes as input a list with data sets (let’s call it dt_list), a vector with the names of the continuous variables (let’s call it dt_cont) and a vector with the names of the categorical variables (let’s call it dt_cat) and returns the mean of the continuous variable per group (categorical variable) for each data set. Apply the function to the heart and retinopathy data sets. In particular, use the continuous variables year (from the heart data set) and risk (from the retinopathy data set) and the categorical variables surgery (from the heart data set) and type (from the retinopathy data set).

Use a for loop. Use the print(…) function to return the results.

Solution 2

summary_list <- function(dt_list, dt_cont, dt_cat){
  for (i in 1:length(dt_list)){
    dt <- dt_list[[i]]
    print(tapply(dt[[dt_cont[i]]], dt[[dt_cat[i]]], mean))
  }
}

summary_list(dt_list = list(heart, retinopathy), dt_cont = c("year", "risk"), dt_cat = c("surgery", "type"))
##        0        1 
## 3.320975 4.105738 
## juvenile    adult 
## 9.745614 9.632530

Task 3*

Extend the previous by including an extra input argument (let’s call it calc) which will indicate each time the measure that we want to calculate e.g, the mean. So that the user can decide which function to use.

Solution 3*

summary_list <- function(dt_list, dt_cont, dt_cat, calc){
  for (i in 1:length(dt_list)){
    dt <- dt_list[[i]]
    print(tapply(dt[[dt_cont[i]]], dt[[dt_cat[i]]], calc))
  }
}

summary_list(dt_list = list(heart, retinopathy), dt_cont = c("year", "risk"), dt_cat = c("surgery", "type"), calc = median)
##        0        1 
## 3.564682 3.978097 
## juvenile    adult 
##       10        9
 

© Eleni-Rosalina Andrinopoulou